10th World Congress in Probability and Statistics
Invited Session (live Q&A at Track 1, 11:30AM KST)
Invited 38
IMS Lawrence D. Brown Ph.D. Student Award Session (Organizer: Institute of Mathematical Statistics)
Conference time: 11:30 AM – 12:00 PM KST
Local time: Jul 21 (Wed), 7:30 PM – 8:00 PM PDT
Efficient manifold approximation with spherelets
Didong Li (Princeton University / University of California)
Data lying in a high-dimensional ambient space are commonly thought to have a much lower intrinsic dimension. In particular, the data may be concentrated near a lower-dimensional subspace or manifold. There is an immense literature focused on approximating the unknown subspace and on exploiting such approximations in clustering, data compression, and the building of predictive models. Most of this literature relies on approximating subspaces using a locally linear, and potentially multiscale, dictionary. In this talk, a simple and general alternative is introduced, which instead uses pieces of spheres, or spherelets, to locally approximate the unknown subspace. Theory is developed showing that spherelets can produce lower covering numbers and MSEs for many manifolds. We develop spherical principal components analysis (SPCA). Results relative to state-of-the-art competitors show gains in the ability to approximate the subspace accurately with fewer components. In addition, unlike most competitors, our approach can be used for data denoising and can efficiently embed new data without retraining. The methods are illustrated with standard toy manifold learning examples and with applications to multiple real data sets.
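The core idea of the abstract, approximating a manifold locally by pieces of spheres rather than by planes, can be illustrated with a minimal sphere-fitting sketch. This is not the authors' SPCA algorithm; it is a standard algebraic least-squares sphere fit under hypothetical names (`fit_sphere`, `project_to_sphere`), shown only to convey what "fitting a spherelet to local data" might look like.

```python
import numpy as np

def fit_sphere(X):
    """Algebraic least-squares sphere fit (a standard construction).

    Each row x of X should satisfy ||x - c||^2 = r^2, which linearizes to
    2 x.c + (r^2 - ||c||^2) = ||x||^2, a linear system in the unknowns
    (c, r^2 - ||c||^2) solvable by ordinary least squares.
    """
    n, d = X.shape
    A = np.hstack([2 * X, np.ones((n, 1))])
    b = (X ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    c = sol[:d]
    r = np.sqrt(sol[d] + c @ c)
    return c, r

def project_to_sphere(X, c, r):
    """Map each point to the nearest point on the fitted sphere
    (a simple way to denoise data assumed to lie near the sphere)."""
    diff = X - c
    return c + r * diff / np.linalg.norm(diff, axis=1, keepdims=True)
```

In a spherelet-style pipeline one would apply such a fit within local neighborhoods of the data, replacing the local linear (tangent-plane) fit of classical local PCA.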
Toward instance-optimal reinforcement learning
Ashwin Pananjady (Georgia Institute of Technology)
The paradigm of reinforcement learning has now made inroads in a wide range of applied problem domains. This empirical research has revealed the limitations of our theoretical understanding: popular RL algorithms exhibit a variety of behavior across domains and problem instances, and existing theoretical bounds, which are generally based on worst-case assumptions, can often produce pessimistic predictions. An important goal is thus to develop instance-specific analyses that help to reveal what aspects of a given problem make it "easy" or "hard", and allow distinctions to be drawn between ostensibly similar algorithms. Taking an approach grounded in nonparametric statistics, we initiate a study of this question for the policy evaluation problem. We show via information-theoretic lower bounds that many popular variants of stochastic approximation or "temporal difference learning" algorithms *do not* exhibit the optimal instance-specific performance in the finite-sample regime. On the other hand, making careful modifications to these algorithms does result in automatic adaptation to the intrinsic difficulty of the problem. When there is function approximation involved, our bounds also characterize the instance-optimal tradeoff between approximation and estimation errors in solving projected fixed-point equations, a general class of problems that includes policy evaluation as a special case. These oracle inequalities, which are non-standard and involve a non-unit pre-factor multiplying the approximation error, may be of independent statistical interest.
Bayesian pyramids: identifying interpretable discrete latent structures from discrete data
Yuqi Gu (Columbia University)
High-dimensional categorical data are routinely collected in the biomedical and social sciences. It is of great importance to build interpretable models that perform dimension reduction and uncover meaningful latent structures from such discrete data. Identifiability is a fundamental requirement for valid modeling and inference in such scenarios, yet is challenging to address when there are complex latent structures. In this work, we propose a class of interpretable discrete latent structure models for discrete data and develop a general identifiability theory. Our theory is applicable to various types of latent structures, ranging from a single latent variable to deep layers of latent variables organized in a sparse graph (termed a Bayesian pyramid). The proposed identifiability conditions can ensure Bayesian posterior consistency under suitable priors. As an illustration, we consider the two-latent-layer model and propose a Bayesian shrinkage estimation approach. Simulation results for this model corroborate identifiability and estimability of the model parameters. Applications of the methodology to DNA nucleotide sequence data uncover discrete latent features that are both interpretable and highly predictive of sequence types. The proposed framework provides a recipe for interpretable unsupervised learning of discrete data, and can be a useful alternative to popular machine learning methods.
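To make the layered structure concrete, here is a hypothetical generative sketch of a small two-latent-layer "pyramid": one binary top-layer latent drives two binary middle-layer latents through a sparse graph, and each categorical observable loads on a single middle latent. The probabilities, sparsity pattern, and function names are illustrative assumptions, not the model or priors proposed in the talk.

```python
import numpy as np

def simulate_pyramid(n, rng, p_top=0.5):
    """Toy two-latent-layer generative sketch (illustrative only).

    Top layer   : one binary latent z_top
    Middle layer: two binary latents, each depending on z_top
    Observed    : four categorical variables (3 levels), each loading
                  on exactly one middle latent (a sparse graph)."""
    z_top = rng.random(n) < p_top
    # middle latents are more likely "on" when the top latent is on
    p_mid = np.where(z_top[:, None], 0.8, 0.2)          # broadcast to (n, 2)
    z_mid = rng.random((n, 2)) < p_mid
    # conditional distributions of each observable given its parent
    probs_on = np.array([0.7, 0.2, 0.1])                # P(X = k | parent = 1)
    probs_off = np.array([0.1, 0.2, 0.7])               # P(X = k | parent = 0)
    X = np.empty((n, 4), dtype=int)
    for j in range(4):
        parent = z_mid[:, j % 2]                        # sparse loading
        for i in range(n):
            p = probs_on if parent[i] else probs_off
            X[i, j] = rng.choice(3, p=p)
    return X, z_mid, z_top
```

Identifiability questions of the kind the talk addresses ask when the conditional probability tables and the sparse graph above can be recovered from the distribution of X alone.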
Q&A for Invited Session 38
This talk does not have an abstract.
Session Chair
Tracy Ke (Harvard University)